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A  SAMPLE  REUSE  METHOD  FOR  ACCURATE 
PARAMETRIC  EMPIRICAL  BAYES  CONFIDENCE  INTERVALS 


Bradley  P.  Carlin  and  Alan  E.  Gelfand 

ABSTRACT 

Parametric  empirical  Bayes  methods  of  point  estimation  for  a  vector  of  unknown 
parameters  date  to  the  landmark  paper  of  James  and  Stein  (1961).  The  usual  approach 
is  to  use  the  mean  of  the  estimated  posterior  distribution  of  each  parameter,  where  the 
estimation  of  the  prior  parameters  ("hyperparameters")  is  accomplished  through  the 
marginal  distribution  of  the  data.  While  point  estimates  computed  this  way  usually 
perform  well,  interval  estimates  based  on  the  estimated  posterior  (called  "naive"  EB  in¬ 
tervals)  arc  not.  They  fail  to  account  for  the  variability  in  the  estimation  of  the 
hyperparameters,  generally  resulting  in  sub-nominal  coverage  probability  in  the  "EB" 
sense  defined  in  Morris  (19S3a). 

In  this  paper  we  extend  the  work  of  Carlin  and  Gelfand  (19S9),  who  proposed  a 
conditional  bias  correction  method  for  developing  EB  intervals  which  corrccrs  the  defi¬ 
ciencies  in  the  naive  intervals.  We  show  ho*v  bias  correction  can  be  implemented  in 
general  via  a  Type  111  parametric  boostrap  procedure,  a  sample  reuse  method  first  em¬ 
ployed  by  Laird  and  Louis  (19S7).  Theoretical  and  simulation  results  indicate  that  in¬ 
tervals  which  arc  accurate  with  respect  to  nominal  coverage  ensue.  We  give  two  specific 
applications  (to  binomial  test  data  and  Poisson  failure  rate  data)  where  we  compute  si¬ 
multaneous  point  and  bias  corrected  interval  estimates. 

KEY  WORDS:  Confidence  interval;  parametric  empirical  Bayes;  parametric  bootstrap; 
bias  correction;  conditional  calibration. 
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I.  INTRODUCTION 


In  this  paper  we  consider  the  problem  of  multiparamctcr  interval  estimation  in  the 

para  men  ic  empirical  Bayes  (RUB)  framework.  We  consider  the  familiar  exchangeable 

model,  where  at  the  first  stage,  given  0,  the  data  vectors  X  arc  independently  distributed 

as  /  (v',1 0,),  /  =  I . p.  At  the  second  stage,  the  0,  are  supposed  i.i.d.  with  distribution 

a(0 !  tj)  over  0,  where  >/  indexes  the  family  a.  The  marginal  distribution  of  X  is 

*=  J/(i'.l0,)  n(0,|>/)</0, ,  and  the  conditional  independence  structure  of  our  model 

implies  that  marginally  the  X  arc  independent.  Thus  the  joint  marginal  distribution  of 

/ 

all  the  data  is  wifylij)  =»  n /rr,(v.  1  r/).  The  posterior  distribution  of  0,  depends  on  the  data 

•*t  '■* 

only  through  j/,  and  is  denoted  by/ (0,1^,,  ij) ,  though  in  the  sequel  we  suppress  the  sub¬ 
script  on  /to  simplify  the  notation. 

If  i\  were  known  then  point  or  interval  estimates  for  0,  would  be  computed  via  the 
posterior  .  Generally,  however,  tj  is  unknown.  A  pure  Bayesian  approach 

would  place  a  third  stage  prior  (also  known  as  a  "hyperprior")  t(jj)  on  tj  and  then  base 
inference  about  0,  on  the  "marginal  posterior," 

0-0 

where  /»(»/ 1  y)  oc  .  The  hyperprior  t(>j)  is  often  given  vague  specification. 

Usually  computation  of  (1.1)  is  an  arduous  task.  In  recent  years  substantial  effort  has 
been  devoted  to  developing  methodology  to  calculate  the  distribution  ( 1 . 1)  and  its 
characteristics,  (see  for  example  Naylor  and  Smith  19S2,  Tierney  and  Kadanc  1986, 
Smith  ct.  al.  1985,  Smith  ct.  al.  19S7,  and  Gclfand  and  Smith  19S9). 

An  alternative  is  the  PEB  approach,  which  treats  >/  as  a  fixed  unknown,  and  replaces 
the  integration  over  rj  in  (1.1)  with  estimation  of  tj  (usually  via  maximum  likelihood) 

A  A 

from  the  marginal  distribution  m(y|i/),  obtaining  =  tj( y) .  Inference  is  based  upon  the 
"estimated  posterior," 
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Note  that  (1.2)  may  be  considered  an  approximation  to  (1.1)  of  order  in  the  sense 
that,  under  mild  regularity  conditions. 

£(s(0i)ly)=«s(0i)b.3)ti  +  0(/>-')D  (1.3/ 

(Kass  and  Stcficy,  19S8).  Expression  (1.3)  formalizes  the  fact  that  PEB  estimates  arc 
approximately  fully  Bayesian  posterior  means. 

There  is  also  a  substantial  amount  of  literature  which  demonstrates  that,  as  an  esti¬ 
mator  of  g(0,),  £(g(0,) I  v,,  tj )  often  performs  well  in  a  decision  theoretic  sense  (see  Morris, 
1983a  for  a  summary).  Unfortunately,  interval  estimation  of  0,  through  (1.2)  has  been 
less  successful;  such  "naive"  confidence  intervals  based  on  appropriate  percentiles  of  the 
estimated  posterior  (either  highest  posterior  density  or  "equal  tail"  intervals)  arc  gener¬ 
ally  too  short,  and  hence  fail  to  attain  the  nominal  coverage  probability.  The  explana¬ 
tion  for  this  problem  from  a  PEB  point  of  view  is  that  we  are  ignoring  the  variability  in 
* 

yj  ;  from  a  Bayesian  point  of  view,  that  we  arc  ignoring  the  posterior  uncertainty  about 
yj.  More  precisely 

Var(0,\y)  =  F^U'oriO^, ,  >/)]  +  ,  *)]  ,  (1.4) 


and  so  the  variance  estimate  based  on  (1.2),  f'ar(0t(£ ,  tj) ,  will,  according  to  (1.3),  only 
approximate  the  first  term  in  (1.4).  Morris  (1983b,  1987)  develops  improved  approxi¬ 
mations  to  (1.4)  in  special  eases  while  Kass  and  Steffey  (1988)  give  a  general  first  order 
approximation. 

We  propose  a  more  direct  attack  on  the  interval  estimation  problem.  We  first  for¬ 
malize  our  objective,  nominal  conditional  or  unconditional  EB  coverage.  We  then  de¬ 
scribe  how  to  correct  the  bias  in  the  naive  interval  estimate  to  meet  the  objective.  Most 
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importantly,  we  show  that  for  many  interesting  problems  implementation  of  the  bias 
correctin'*  can  be  accomplished  via  a  sample  reuse  method.  In  particular  we  employ  the 
Type  111  parametric  bootstrap  introduced  by  Laird  and  Louis  (1987),  applying  it  to  two 
discrete  exponential  family  data  sets. 

In  the  KB  framework  (Morris  1983a, b,  Hill  19SS)  a  statistical  model  is  appealingly 
specified  as  a  collection  of  joint  probability  distributions  indexed  by  some  parameter, 

(1.5) 

A  member  of  this  famih'  is  expressible  in  the  sampling  form  as  p,(y,0) « /(y!0)  a(0|>/) , 
or  in  the  inferential  form  as  pjy,  0)  **f{0\y,  ij)  m(ybi)  .  Performance  of  an  inference 
procedure  is  evaluated  over  the  variability  inherent  in  both  0  and  the  data.  Thus  an 
unconditional  KB  confidence  set  of  size  1  -  a  for  g(0)  is  a  subset  t,{  Y)  of  0  such  that 

inT /»,{«(«  «OV)J  St  !-«•  (1.6) 

p 

Kquation  (1.6)  has  been  criticized  as  being  a  weak  statement.  First,  we  would  likely 
prefer  P,{g(0)  e  r.(Y)}  =  1  -  a  over  all  distributions  in  p.  Second,  we  would  likely  prefer 
a  probability  statement  which  offers  conditional  calibration  given  some  appropriate  data 
summary  (Rubin,  I9S4).  For  instance  a  pure  Bayesian  interval  would  be  based  upon 
/(0,|y),  and  so  is  conditionally  calibrated  given  all  the  data.  We  propose  modifying  (1.6) 
to  a  conditional  statement  by  integrating  instead  over  the  distribution  /?(0„  y  I  b( y), ;/)  for 
a  suitable  statistic  £>( Y) .  That  is,  r,(Y)  is  a  conditional  1  -  a  EB  confidence  set  for  g(0) 
given  b{ Y)  if,  for  each  b( y)  =  b  and  y  , 

^{g(0)e/a(Y)|i(y)  =  6)  =  !-a.  (1.7) 

In  Section  2,  we  review  the  “bias  correcting”  approach  (given  in  Carlin  and  Gclfand, 
1989)  for  obtaining  intervals  achieving  this  sort  of  coverage  probability  using  various 
choices  of  b{ Y).  We  then  settle  on  £>(Y)  =  £  as  a  natural  choice,  and  elaborate  on  sam- 
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pic  reuse  methods  to  implement  ihe  bias  correction.  Section  3  discusses  the  mechanics 
of  hyperparameter  estimation  and  EB  point  and  interval  estimation  oP0,  in  the  context 
of  two  discrete  models,  the  binomial  and  the  Poisson.  For  illustration,  the  analysis  of 
two  data  sets  is  given  in  Section  4.  Finally  in  Section  5  we  summarize  and  comment  on 
some  remaining  problems. 

2.  PEB  MODELLING  AND  BIAS  CORRECTION 

Adopting  the  EB  framework  we  define  the  concept*  of  sufficiency  and  ancillarity  (see 
Mill,  19S8)  for  the  model  (1.5). 

Definiti  n  I.  A  statistic  T  is  sufficient  for  g(0)  in  (1.5)  if  and  only  if 

•  T  is  sufficient  for  g(0)  in  the  posterior  family,  i.e., /(g(0)  I /,  .*/)  *“/(g(0)  I  >\  *l)  for  all 
ijeH,  and, 

•  T  is  sufficient  for  >/  in  the  marginal  family,  i.e..  m  (y If,  >/)  “  nt  (y 1 0  for  all  >/  e  H  . 

For  example,  under  the  exchangeable  structure  of  Section  1 ,  if  g(0)  =  0,  ,  then  usually 

r-o.i). 

Using  this  definition  it  is  straightforward  to  show  that  T  carries  all  the  information 
about  >/  in  the  model  with  respect  to  inference  about  g(0),  i.e.  we  may  replace  Y  by  Tin 

4, 

(1.6)  and  (1.7).  In  particular  for  0,  we  need  only  study  intervals  based  on  Z  and  ?/. 

Definition  2.  A  statistic  A  is  ancillary  forg(0)  in  (1.5)  if  and  only  if  T=  (jj,  a )  is  minimally 
sufficient  for  g(0),  where  rj  is  an  estimate  of  rj,  and  /?(a|r/)  =  p{a)  for  all  t\  e  H. 

Thus  if  A  is  ancillary,  its  distribution  with  respect  to  the  marginal  family  is  free  of »/.  In 
this  definition,  minimal  sufficiency  is  with  respect  to  (1.5)  using  Definition  1. 


In  the  sequel  we  take  g(0i  -  0,  and  let  [Cl  13  be  a  generic  notation  for  the  distrib¬ 
ution  of  l'  gi\cn  V.  To  strengthen  the  statement  (1.6)  suppose  we  replace  the  inte¬ 
gration  over  [0,1.  »;!>;]  in  (1.6)  with  an  integration  over  [0,,£, »/ 1 6(Y).  >/j.  If  we  take 
MY)  «=*  Y  then  we  arc  in  faet  integrating  over  [0,l£,  >/3.  the  posterior  distribution  of  0,. 

A 

If  we  choose  MY)  HD-ancillary,  then  typically  (/»,  >/,£,)  are  such  that  fixing  any  two  of 
them  determines  the  third.  This  means  that  usually  [0„£, 

=  [0,, XI >/,M  >/]•[>;  I M>/]  »[0.l£.  >/]*[>/ 1 >/].  where  we  have  used  Basu's  well-known 
theorem  in  the  last  step.  Thus  if  we  can  find  an  HD-ancillary  statistic,  we  can  develop 
intervals  with  HB-covcragc  conditional  on  the  ancillary  merely  by  integrating  first  over 

A 

the  full  posterior  and  then  over  the  sampling  distribution  of  our  estimator  rj.  However 
in  the  discrete  cases  we  study,  and  in  fact  rather  generally,  exact  ancillary  statistics  arc 
not  available.  We  have  not  investigated  the  notion  of  approximate  ancillarics.  If  we 
instead  take  MY)  “  £,  then  our  integration  is  over  [0(,  tj  |£,  =  [0,|£,  >/]  •  [i/ |£,  >/]  . 

This  choice  of  b  is  appealing  since  £  is  sufficient  for  0,  in  the  posterior  family.  Addi¬ 
tionally,  this  conditioning  is  straightforward  to  implement. 

If  we  denote  the  a,h  quantile  of  the  posterior /(0,|  v„  >/)  by  q,{ t*,  >/),  then  the  so-called 
equal  tail  "naive'  ED  interval  can  be  written  as 

A  A 

»/).  >1))  (2-1) 

As  has  already  been  noted,  intervals  of  this  type  fail  to  satisfy  (1.6),  and  their  coverage 
conditional  on  £  is  also  poor.  Most  of  the  work  in  the  area  of  OB  confidence  intervals 
has  focused  on  "lengthening"  these  intervals  by  using  the  marginal  posterior  (1.1),  either 
exactly  or  approximately.  Declv  and  Lindlcy  (l‘>81)  and  Rubin  (1982)  perform  the  nu¬ 
merical  integration  and  compute  4  directly.  Morris  (198?)  advocates  use  of  the  member 
of /(0,li’„  jj)  whose  first  two  moments  agree  with  the  fust  two  (estimated)  moments  of 
Laird  and  Louis  (19S7)  suggest  the  use  of  a  parametric  bootstrap  sampling  method 
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(which  wc  discuss  in  some  detail  below*  10  approximate  (1.1)  where  h  is  taken  to  he 
p(»)|»/i.  the  sampling  density  of  >j,  with  the  arguments  interchanged. 

In  our  view,  the  weakness  of  these  approaches  is  that  they  fail  to  address  the  salient 
issue,  the  existence  and  nature  of  an  h  which  will  be  successful  in  achieving  nominal  I;i) 
coverage.  Put  simply,  they  arc  all  concerned  with  "matching"  /„  in  (1.1),  without  speci¬ 
fying  how  to  choose  h  to  begin  with.  This  is  understandable,  as  the  whole  approach  is 
not  directly  aimed  at  attaining  a  specified  level  of  UB  coverage  (cither  conditional  or 
unconditional),  furthermore,  while  the  lengthening  of  the  naive  intervals  created  by  the 
"mixing”  in  (1.1)  is  usually  desirable,  if  our  estimator  q  is  badly  biased,  the  naive  intervals 
can  actually  turn  out  to  be  too  long.  The  real  issue  is  how  to  correct  the  bias  in  (2.1). 
A  direct  approach  which  would  be  applied  to  each  tail  separately  is  as  follows: 

Suppose  wc  define 

kJ.  *  * «) = no,  <.  i  «r/(«il£.  >/)i  p-2) 

and 

/?(»/. iV*  a)  =  E V.  M>/.  >/.&,  a)) •  t2-3) 


If  wc  then  solve 


R{*l,  A).  «')*  «  (2-4) 

A 

for  a'  ■=  a),  wc  conditionally  "correct  the  bias”  in  using  t\.  Of  course  (2.4)  is  not 

A 

solvable  as  it  stands  since  tj  is  unknown.  Using  tj,  wc  propose  to  obtain  a  bootstrap 

A 

estimate  of  the  left  hand  side  of  (2.4)  and  thus  solve  instead  for  «'(>/,&. «)  •  Wc  correct 
both  the  lower  and  upper  percentiles  of  our  naive  interval,  obtaining  at'  and  a„'t  and 
then  take 
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lr.-  »/ll 


(2.5) 


as  our  bins  corrected  interval  for  0  . 


Carlin  and  Gclfand  ( 10S*>!  shovveJ  that  this  bias  corrected  confidence  interval  is 
unique  provided  t'r/ca  exists.  They  further  showed  that  if  the  distributions 
f{Q, l,y.,r/)  and  gb/l^.i/)  are  stochastically  ordered  in  rj,  then  conditional  bias  correction 

0  a 

"works-  with  respect  to  (1.7),  i.c.  rfjj,  j/.jy,  a'(>/,v„  a))  ,  the  expected  conditional 
bias  corrected  tail  probability,  falls  in  an  interval  containing  a  (for  some  t],  it  will  be 
slightly  more  than  a;  for  others,  slightly  less).  They  also  provide  simulation  support  in 
several  elementary  eases.  Note  that  while  we  describe  bias  correction  of  the  naive  in¬ 
terval  (2.1)  we  could  just  as  well  have  chosen  to  bias  correct  intervals  resulting  from 
(1.1).  Such  correction  could  be  used  to  possibly  obtain  shorter  intervals  achieving  de¬ 
sired  coverage  levels. 


Let  us  briefly  investigate  the  mechanics  of  bias  correction.  If  n{0\t])  is  chosen  as  the 
standard  conjugate  prior  for/^y.  1 0.)  then  of  course  the  posterior  tj)  belongs  to  the 
same  standard  family,  and  r  in  (2.2)  takes  the  relatively  simple  form 


r(’l,  »/.£,.  a> 


(2.6) 


where  F  is  the  posterior  c.d.f.  To  compute  R  in  (2.3)  necessitates  integrating  over  the 
distribution  g(?j  1^,,  rj) .  If  g  is  available  in  closed  form  this  will  require  a  numerical  in¬ 
tegration  (either  Monte  Carlo  or  other  quadrature  method).  Then  we  compute  the 

A  A 

marginal  MLE  >/  and  solve  equation  (2.4)  at  ?/  =  ij  for  a'  via  rcgula  falsi.  This  replace¬ 
ment  of  an  unknown  population  parameter  by  its  estimate  from  the  marginal  empirical 
c.d.f.  is  referred  to  as  a  "parametric  bootstrap"  by  Laird  and  Louis  (1987). 

Examples  where  tills  procedure  can  be  carried  out  efficiently  and  easily  are  discussed 
in  Carlin  and  Gclfand  ( 19S9).  However,  in  many  eases  (in  particular  the  discrete  settings 
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* 

in  Section  ?i  a  closed  form  for  yu/ I  r„  >j)  is  unavailable.  In  fact,  in  several  elementary 
conjugate  situations,  the  marginal  MIX  »/  itself  has  no  closed  form  and  can  only  be 
computed  numerically.  In  such  cases  wc  estimate  ROh&,  al  through  the  use  of  a  Type 
111  parametric  bootstrap."  a  sample  reuse  method  introduced  by  Laird  and  Louis  (19S7) 
and  modified  here  for  the  case  of  OB  coverage  conditional  on  X* 

Let  us  first  review  the  unconditional  version.  To  estimate  expectations  under  the 
sampling  density  of  n.pvibj).  we  obtain  >/*  from  the  distribution  p(*  l>j)  us  follows. 
Draw  0J, ... ,  0;  i.i.d.  from  n(0|>/) ,  then  draw  X  independently  from /(H0;),  /«  I 
and  finally  compute  >/'  from  the  "pscudodata“  {JL!T>  in  the  same  way  that  >/  was  computed 
from  the  data  IX)  .  Concisely,  wc  have 

■*/  -♦  (0| }  ”*  (X)  **♦  >/  (2-7) 


For  correction  conditional  on  X.  wc  modil>-  the  Laird  and  Louis  procedure  in  order 
to  draw  observations  from  g  rather  than  p  by  changing  (2.7)  to 


A 

jj- 


{0k,  (X*.  A  &  /}  -*  =  >7  (V|.  US,  A  *  /)) 


(2.S) 


M  . 

Repeating  this  process  .V  times  wc  obtain  I £,,  >?)../ I,...,  A',  and  our  Type  III 

parametric  bootstrap  estimate  of  /*(//,£„  «')  becomes 


S  .  A 

«')  /Ar. 


(2.9) 


Wc  equate  (2.9)  to  o  ,  and  solve  for  a'  by  rcgula  faisi  as  above.  Note  that  since  only  the 
X  arc  needed  to  calculate  if,  if  m(v,  \tj)  can  be  sampled  from  directly  wc  an  omit  the 
generation  of  the  0;  (sec  Example  2.2). 

The  Type  III  parametric  bootstrap  can  obviously  be  modified  to  implement  bias 
correction  conditional  on  b{ Y)  other  than  X>  1  lowcvcr  the  theoretical  results  below  (2.5) 
arc  only  established  given  X- 
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3.  SPECIFIC  DISCRETE  MODELS 


We  now  applv  the  preceding  methodology  to  perhaps  the  two  most  common  discrete 


models. 


Example  3.1:  Binomial-beta.  Suppose  the  data  vectors  X.  arc  simply  vectors  of  zeroes 
and  ones,  each  clement  corresponding  to  a  success  or  failure  on  the  independent 
Bernoulli  trial  in  the  iVH  population,  j «„  i»l . p.  At  the  first  stage  of  the  hi¬ 

erarchy.  we  assume  the  probability  of  success  on  any  given  trial  to  be  the  same  for  all 
trials  within  the  population,  but  possibly  different  across  populations.  By  sufficiency 

wc  reduce  to  A*  “I and  so  A’|0,~  Bin(«,0,)  .  Under  the  conjucatc  prior 
j«i  * 

i 

fl.h/-  Beta  (a,  b),  /«=  l . p  .  where  >/*=(«,  b),  a,b>  0,  the  posterior  distribution  is 

f{0,\x„  a,  (»)“  Beta  (n  +  .v,,  &  +  //,-. c.)  ,  and  the  marginal  distribution  is 

«j(x,|  a,  b)  =*  +  .v„  b  +  n,  -  x,)IB[a,  b ),  where  #(*,*)  represents  the  beta  function. 

The  marginal  likelihood, 


L(a,  b)  <=  m(il  a,  b)  =  11  ("')/»(«  +  xh  b  +  nt  -  x[)jD[at  b), 
/■i  * 


a  #.  A 

is  maximized  via  numerical  methods  to  produce  the  marginal  MLE, «  » (a,  b).  The  naive 
EB  confidence  interval  (2.1)  is  computed  as  (Y »  (a/2),  Y  ;•  .  (1  -  a/2))  , 

where  YUiA  is  the  c.d.f.  of  a  beta  distribution  with  parameter  c  and  d. 

To  implement  the  bias  correction  wc  note  that  (2.2)  becomes 

rln, . x„  «)  ■ =  r, j  +  nj .  XJW)  (3.2) 

which  is  available  numerically  using  a  scientific  subroutine  library.  Using  the  Type  III 
parametric  bootstrap  procedure  (2.9)  becomes 
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which  wc  equate  to  a  and  solve  for  a'.  Setting  a  to  the  desired  nominal  level  (c,g., 
a  =  (.05,  .95)  for  90%  conditional  1:B  coverage),  wc  solve  for  the  corresponding 
(ar/«  V).  a»d  compute  our  interval  for  0,  using  (2.5).  Note  that  wc  bias  correct  each 

0,-intcrval  separately,  i «  1 . p,  but  of  course  the  correction  in  each  ease  depends  on 

the  data  through  (a,  b).  If  wc  desire  intervals  corrected  only  for  unconditional  EB  cov¬ 
erage,  our  bootstrap  equation  becomes 


,v 

r  y  . 

p I  la  +x',b  +n,-xj 


(3.4) 


Note  that  (3.4)  differs  from  (3.3)  only  in  replacing  the  given  value  x,  by  the  bootstrapped 
a-’  *$.  Now  wc  arc  "averaging  over  A',"  as  in  (1.6).  rather  than  conditioning  on  A',.  Wc 
must  still  bias  correct  each  0,-intcrval  separately,  unless  wc  have  a  balanced  experiment 
(all  n,  equal).  In  this  case  the  a-/s  arc  marginally  exchangeable,  hence  so  arc  the  a-,‘  *s, 
and  so  wc  need  only  solve  (3.4)  once  for  (at\  au')  before  using  (2.5). 

Example  3.2:  Potsson-gamnu.  Suppose  wc  have  data  of  the  form  (a-„  /,),  /=  1, ...  ,p, 
where  the  x,  arc  observed  counts  during  the  time  interval  (0,  i).  For  example,  the  data 
might  be  calls  arriving  at  p  different  switchboards  in  the  same  :ounty.  Wc  assume 

■nd 

A' Id,  -  Poisson  (0,0, /  *=  I where  the  i,  arc  known  "time  exposures."  Under  the 
conjugate  pnor,  0,|jj-*  Gamma  (a,  b)  (again  tj^{a,b),  a,b>  0  ),  the  posterior  distrib¬ 
ution  0,1  at,  is  Gamma  (n  +  x, ,  (r,+  I  /0)->) ,  and  the  marginal  distribution  of  A'  is  Nega¬ 
tive  Binomial,  i.c., 


m(xl\a,b)  = 


(wU  /,  w  Mb  y 
'  “-1  /,+  lib  '  V  t,+  Mb  ' 


(3.5) 


^  A  A 

If  (a,  b )  maximize  the  marginal  likelihood  [a  and  b  arc  not  available  in  closed  form;  sec 
Section  4),  then  the  naive  EB  confidence  interval  for  0,  is 


ll 


(oi'+„(»/2VPt«,+  !«•)].  “«m 2«<+  I/M]). 


(3.6) 


where  /),  denotes  the  c.d.f.  of  a  chi-square  distribution  with  A  (not  necessarily  integer) 
degrees  of  freedom. 

in  this  ease  (2.2)  becomes 

oWtd.  +  i/M  / «,  +  i/M]  o,;'+,jW).  (3-7) 

for  l:B  correction  conditional  on  .V, «***,,  analogous  to  (3.3),  the  Type  HI  parametric 
bootstrap  (2.9)  becomes 

j,°2<WC(/'  +  »/*;»  ^+JC, ,(«•)}  /at  (3.S) 

which  we  equate  to  a  and  solve  for  a'.  For  unconditional  correction  analogous  to  (3.4) 
we  have  the  equation 

/'v  - «  0-»> 

The  remark  after  (2.9)  reminds  us  that  in  this  ease  we  would  generate  negative  binomial 
Ay's  directly.  In  addition,  if  t,  =  t  for  all  /,  we  need  only  solve  (3.9)  once  for  (ft|/,au') 
before  using  (2.5;. 

In  this  example  if  we  take  the  gamma  scale  hyperparameter  b  to  be  known  (say 
b  —  1  w.i.o.g.),  and  if  we  assume  /,  =  /{  =  1  w.l.c.g.),  then  the  marginal  family  (3.5)  is 

Negative  Binomial  (a,  1/2).  The  method  of  moments  (instead  of  maximum  likelihood) 

estimator  of  a  is  a  =  A’  =  EA 'Jp  *md  the  distribution  of  a\x„  a  follows  from  writing 

l»  1 

a-(lV+  A'Mp  where  IF=  EA',~  Negative  Binomial  ( a{p —  1),  1/2).  Thus  we  can  inte¬ 
grate  (2.4)  directly,  avoiding  the  Type  III  resampling  algorithm. 


4.  DATA  EXAMPLES 


Wc  illustrate  the  implementation  of  ihc  bias  corrected  naive  l:B  confidence  interval 
approach  with  two  discrete  data  sets.  Results  of  several  simulation  studies  evaluating 
coverage  probabilities  and  interval  lengths  arc  discussed  in  Carlin  (I9S9). 

The  data  set  in  Table  1  comes  from  Burton  and  Turvey  (198S),  and  gives  the  results 
of  a  psychological  evaluation  of  six  independent  subjects.  Each  subject  was  given  three 
hollow  wooden  balls,  two  of  which  contained  a  small  pyramid  and  the  third  either  a 
hemisphere,  a  block,  a  cylinder,  or  a  cone.  On  each  of  the  four  resulting  trials,  the 
subject  had  to  guess  which  ball  was  not  like  the  others  simply  by  shaking,  turning,  etc. 
Table  1  gives  the  results  of  this  (balanced)  experiment,  where  .V,  is  the  number  of 
questions  answered  incorrectly  by  subject  /,  and  r,  is  the  raw  failure  rate,  which  of  course 
is  also  the  usual  classical  point  estimate  of  0„  the  true  failure  rate  for  subject  /. 

(Insert  Table  1  about  here) 

In  terms  of  modelling  this  experiment,  our  binomial-beta  model  seems  natural.  At 
the  first  stage  (given  0,),  a  subject's  four  responses  could  reasonably  be  assumed  to  be 
i.i.d.  Bernoulli  trials,  and  the  beta  provides  a  broad  choice  for  the  second  stage  distrib¬ 
ution  of  the  { 0 ,}  .  In  fact,  since  our  prior  belief  is  that  the  questions  arc  relatively  easy, 
the  simpler  family  Bcta(  1 ,  b),  b  >  1  seems  adequate.  Finally  this  data  set  benefits  sub¬ 
stantially  from  empirical  Bayes  modelling,  since  the  small  amount  of  information  on 
each  subject  severely  limits  frequentist  inference. 

(Insert  Table  2  about  here) 

The  results  of  our  analysis  arc  given  in  Table  2.  Since  wc  arc  in  a  balanced  (i.i.d.) 
case,  the  results  for  subjects  1  and  2  (who  both  answered  two  questions  incorrectly)  arc 
identical,  as  are  those  for  subjects  4,  5  and  6  (all  of  whom  made  no  mistakes).  Note  the 
familiar  shrinking  of  the  classical  point  estimates  toward  the  overall  proportion  of 
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questions  answered  incorrectly  (5, 2d  ®  .208)  by  the  KB  point  estimator.  Classical  con¬ 
fidence  intervals  (based  on  Fisher's  exact  test  statistic  for  0,)  and  naive  KB  intervals  arc 
shown,  along  with  versions  that  arc  bias  corrected  unconditionally  (via  (3.4))  and  con¬ 
ditionally  (via  (3.3))  for  nominal  coverage  y  =  .90.  We  chose  N  «  1000  bootstrap  reps 
in  solving  these  two  equations,  using  FORTRAN'  augmented  by  the  IMSL  subroutine 
library.  One  can  see  that  both  types  of  bias  corrected  intervals  have  lengths  that  arc 
between  those  of  the  classical  and  naive  EB  methods.  However,  since  nominal  condi¬ 
tional  coverage  implies  nominal  unconditional  coverage,  neither  bias  corrected  interval 
can  be  uniformly  shorter.  In  particular  we  see  that  the  conditional  intervals  arc  shorter 
when  ,V“  2  ,  but  that  the  unconditional  intervals  arc  shorter  when  A' «  0. 

The  data  presented  in  Table  3  record  numbers  of  pump  failures,  A',  observed  in 
thousands  of  hours,  t,  ,  for  p*=  10  different  systems  of  a  certain  nuclear  power  plant. 
The  observations  arc  listed  in  increasing  order  of  raw  failure  rate  r,  =»  A',/r„  which  again 
is  the  classical  point  estimate  of  the  true  failure  rate  0,  for  the  system.  This  data  ori¬ 
ginally  appeared  in  Worlcdgc,  Stringham,  and  McClymont  (19S2),  and  was  subjected  to 
an  empirical  Bayes  analysis  by  Gaver  and  O’Muirchcartaigh  (1987). 

(Insert  Table  3  about  here) 

Our  approach  is  that  of  Example  3.2  above,  using  the  conjugate  Gamma (a,b)  prior 
for  0,.  Gaver  and  O'Muirchcartaigh  also  explore  this  approach,  but  after  computing  the 
estimated  posterior  (also  gamma,  of  course)  in  the  usual  way,  they  obtain  an  approxi¬ 
mate  EB  confidence  interval  for  Q,  by  assuming  that  the  posterior  distribution  of 
c,  -  log(0,)  is  approximately  normal.  Thus  their  EB  point  estimate  for  6,  is  cxp(/),),  and 
their  naive  90%  EB  confidence  interval  for  6,  is  given  by 

(cxptf,-  1.645a/) ,  cxp(£/+  1.645$/)),  (4.1) 

1< 


where  ji,  and  a,  are  ihc  mean  and  standard  deviation,  respectively,  of  the  (log-gamma) 
estimated  posterior  for  e..  (Actually,  the  authors’  concern  about  conjugate  priors' 
overshrinkage  of  outliers  leads  them  to  prefer  a  heavier-tailed  log-Studcnt's  t  prior  on 
c„.  However,  they  conclude  that  this  assumption  docs  not  greatly  afreet  the  results.  To 
make  a  fair  comparison,  we  will  compare  our  intervals  only  with  their  gamma-based  in¬ 
tervals.) 

Solving  (3.8)  and  (3.9)  involves  computing  N  MLE  vectors  [ti},b't)%  one  for  each 
bootstrapped  pscudodaia  sample.  In  order  to  expedite  the  maximization  Gaver  and 
O'Muirchcartaigh  suggest  using  as  starting  values  crude  moments  estimators  obtained 
by  equating  the  first  two  sample  moments  of  the  crude  rates,  F  and  jJ,  to  the  corre¬ 
sponding  moments  in  the  marginal  family,  namely,  £(r,)  ■*  £(.Y,)//,  =»  nb  and 
Var{r,)  «=>  l'or(.V,)//; «  abU,  +  ab: .  This  results  in 

and 

Wl  =  0?  -  rr'yf  (4.3) 

where  /"•  =  I  (These  estimators  do  roi  exist  if  s*  £  rr '  ,  whence  any  other  rca- 

i*i 

sonablc  values,  possibly  the  "parent"  values  (a,  b ),  can  be  used.)  Our  algorithm  begins 
with  Newton-Raphson.  If  it  diverges,  we  use  a  local  grid  search  to  find  a  better  (higher 
likelihood)  place  to  restart.  We  iterate  grid  search  and  Newton-Raphson  up  to  twenty 
times,  finally  giving  up  and  regenerating  new  pscudodata  if  the  algorithm  still  fails.  We 
were  able  to  keep  the  "failure  rate"  under  1%  using  this  algorithm. 

Gaver  and  O'Muirchcartaigh  felt  that  the  intervals  (4.1)  were  likely  to  be  too  short 
due  to  hyperparameter  estimation.  Their  ad  hoc  remedy  was  to  compute  approximate 

A  A 

joint  95%  confidence  regions  for  their  computed  values  of  (a  =  .8223,  b  —  .7943)  using 

A  * 

a  chi-squared  likelihood  ratio  technique.  They  then  searched  over  all  (a,  b)  in  this  re- 
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gion,  taking  the  largest  value  of  cxp(/i,+  l.<M5o,)  ohtained  as  their  ’‘corrected"  upper 
confidence  limit  for  0,.  (A  similar  procedure  could  have  been  undertaken  for  the  lower 
limit,  but  was  not  since  the  authors'  interest  was,  understandably,  only  in  how  large  the 
failure  rates  could  be.)  While  this  process  certainly  docs  increase  the  upper  confidence 
limit,  it  is  not  clear  what  confidence  level  to  attach  to  it. 

(Insert  Table  4  about  here) 

Table  4  gives  the  results  for  observations  1,  5,  6,  and  10  using  the  classical  interval, 
(1/2*  Dr/fa/i),  1/2  •  ,„(1  — a/2 )),  applied  to  the  raw  rates,  the  naive  and  corrected 

Gaver  and  O'Muircheartaigh  (G  &  0)  methods,  the  naive  EB  interval  (2.1),  and  the 
conditional  and  unconditional  bias  corrected  methods  (3.S)  and  (3.9)  above.  Bias  cor¬ 
rected  percentile  (a')  values  used  in  these  methods  are  shown  in  parentheses  below  the 
corresponding  interval  endpoint. 

Some  conclusions  arc:  while  the  classical  point  estimates  seem  all  right,  the  corre¬ 
sponding  interval  estimates  arc  very  wide.  The  naive  G  &  0  point  estimates  have  been 
uniformly  shrunk  closer  to  zero  than  the  regular  EB  point  estimates,  while  the  corre¬ 
sponding  upper  confidence  limits  arc  uniformly  further  firem  zero.  The  naive  G  &  O 
interval  estimates  arc  not  necessarily  too  short;  in  the  ease  of  the  smallest  observation, 
the  naive  G  &  0  upper  limit  is  already  larger  than  either  or  the  bias  corrected  upper 
limits!  The  "con  cctcd*'  G  &  0  upper  limit  is  always  much  larger  than  that  of  any  of  the 
other  EB  methods,  reflecting  the  substantial  conservatism  embodied  in  this  procedure. 
Note  that  more  bias  correcting  (more  extreme  values  of  a',  hence  longer  confidence  in¬ 
tervals)  is  present  for  eases  having  shorter  history'  (smaller  r,)  -  observation  #5,  for  ex¬ 
ample.  This  jibes  with  our  intuition  about  EB  point  estimation:  that  cases  with  less 
information  have  more  uncertainty  associated  with  them  and  possibly  exhibit  more  ex¬ 
treme  shrinkage  patterns.  It  also  appears  that  conditioning  on  the  value  of  a  more 
highly  variable  X,  (like  observation  #5)  results  in  a  longer  interval  than  would  have  been 
obtained  had  only  unconditional  coverage  been  sought. 
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5.  CONCLUSION 


In  this  paper  we  have  described  a  sample  reuse  method  (the  Type  III  parametric 
bootstrap)  to  correct  the  bias  in  naive  empirical  Bayes  confidence  intervals  when  trac¬ 
table  distribution  theory  is  unavailable.  Through  data  examples  we  have  shown  that  this 
method  is  easily  implemented  yielding  intervals  which  retain  much  of  the  intuitive  appeal 
associated  with  PEB  and  shrinkage  estimation.  Future  effort  is  directed  at  more  general 
applications.  For  example,  the  conjugate  prior  assumption  may  be  dropped,  using  nu¬ 
merical  methods  to  evaluate  (2.6)  in  the  absence  of  a  convenient  form  for  the  posterior. 
Additionally  we  might  choose  to  bias  correct  intervals  based  on  the  marginal  posterior 
(1.1),  rather  than  the  naive  intervals  based  on  the  estimated  posterior  (1.2).  Beginning 
with  a  richer  class  of  intervals  could  lead  to  shorter  corrected  intervals  achieving  nominal 
coverage.  Finally,  efforts  to  unite  the  PEB  and  hierarchical  Bayesian  literature,  as  in 
Section  I  of  this  paper  as  well  as  Kass  and  Steffey  (I98S),  continue  to  be  important. 
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TABLE  1.  Pychological  Test  Data 


Subject 

tt, 

X 

r  m  0 

1 

A 

2 

.50 

2 

A 

2 

.50 

3 

A 

? 

.25 

4 

A 

0 

.00 

5 

A 

0 

.00 

6 

A 

0 

.00 

Source:  Burton  and  Turvey  ( 198S) 


TABLE  2.  Psychological  Text  Data  Analysis 


Subject 
»  r. 

90V.  Classical  Cl 

Lower  Upper  Length 

EB 

Pt  Est 

90%  Nai«  EB  Cl 

Lower  Upper  Length 

ng 

0.098 

O.902 

0.S05 

0.336 

0.112 

0.604 

0.491 

wm 

0.250 

0.013 

0.751 

0.739 

0.224 

0.047 

0.474 

Eaa 

HXW 

0.000 

0.527 

0.527 

0.112 

0.006 

0.315 

Subject 

90*/.  Unconditional  BCN  Cl 

90%  Conditional  BCN  Cl 

i 

Lower 

Upper 

Length 

Lower 

Upper 

Length 

1,2 

■iihkd 

0.582 

0.100 

■Sf'9 

3 

m'X'Ie* 

■ilVtl 

0.526 

0.047 

-ifl 

4,5,6 

0.006 

0.415 

0.409 

0.008 

0.447 

Note:  Hyperparameter  estimate  (MLE):  b  «  3.9296. 


TABLE  3.  Pump  Failure  Data 


System , 

X, 

t. 

r,  -  Dp-"-*' 

1 

5 

94.320 

.053 

2 

1 

15.720 

.064 

3 

5 

62.8S0 

.080 

4 

14 

125.760 

.111 

5 

3 

5.240 

.573 

6 

19 

31.440 

.604 

7 

1 

1.04S 

.954 

8 

1 

1.048 

.954 

9 

4 

2.096 

1.910 

10 

22 

10.4S0 

2.099 

Source:  Electric  Power  Research  Institute  Report 
(Worlcdgc,  Stringham,  and  McClymont  1982) 
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TABU-.  -4.  Pump  failure  Data  Analysis 


Observation:  I 
Classical 

'Naive*  G  &  0 

'Corrected*  G  &  0 

Naive  Ull 

Unconditional  BCN 
Conditional  BCN 


Observation:  5 
Classical 

'Naive*  G  &  0 

'Corrected*  G  &  0 

Naive  OB 

Unconditional  BCN 
Conditional  BCN 


Observation:  6 
Classical 

'Naive*  G  &  0 

'Corrected'  G  &  0 

Naive  OB 

Unconditional  BCN 
Conditional  BCN 


Observation:  10 
Classical 

'Naive*  G  &  0 

'Corrected'  G  &  0 

Naive  OB 


l*t  Hstimatc 

90%  Confidence  Interval 

.053 

.000 

3.09S 

.056 

.027 

.114 

* 

••• 

.125 

.061 

.026 

.107 

0 

.026 

.109 

(.0472) 

(.9533) 

«* 

.025 

.108 

(.0453) 

(.9507) 

.573 

.00-1 

4.032 

.512 

.207 

1.258 

••• 

1.553 

.5SS 

.195 

1.154 

- 

.I5S 

1.271 

(.0268) 

(.9705) 

0 

.163 

1.284 

(.0298) 

(.9721) 

.604 

.006 

4.087 

.577 

.395 

.861 

0 

... 

.905 

.606 

.401 

.846 

• 

.394 

.863 

(.0437) 

(.9594) 

0 

.398 

.863 

(.0471) 

(.9594) 

2.099 

.396 

6.444 

1.896 

1.343 

2.691 

0 

— 

2.945 

1.944 

1.327 

2.658 

0 

1.230 

2.735 

(.0250) 

(.9638) 

m 

1.311 

2.674 

(.0452) 

(.9532) 

Interval  l^nnth 


Unconditional  BCN 
Conditional  BCN 


1.363 
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20.  ABSTRACT 

j  *  • 

Parametric  empirical  Bayes  methods  of  point  estimation  for  a  vector  of 
unknown  parameters  date  to  the  landmark  paper  of  James  and  Stein  (1961).  The 
usual  approach  is  to  use  the  mean  of  the  estimated  posterior  distribution  of  each 
parameter,  where  the  estimation  of  the  prior  parameters  ("hyperparameters")  is 
accomplished  through  the  marginal  distribution  of  the  data.  While  point  estimates 
computed  this  way  usually  perform  well,  interval  estimates  based  on  the  estimated 
posterior  (called  "naive"  EB  intervals)  are  not.  khey  fail  to  account  for  the 
variability  in  the  estimation  of  the  hyperparameters,  generally  resulting  in 
sub-nominal  coverage  probability  in  the  "EB"  sense  defined  in  Morris  (1983a). 

In  this  paper  we  extend  the  work  of  Carlin  and  Gclfand  (1989),  who  proposed 
a  conditional  bias  correction  method  for  developing  EB  intervals  which  corrects 
the  deficiencies  in  the  naive  intervals.  We  show  how  bias  correction  can  be 
Implemented  in  general  via  a  Type  III  parametric  bootstrap  procedure,  a  sample 
reuse  method  first  employed  by  Laird  and  Louis  (1987).  V Theoretical  and  simula¬ 
tion  results  indicate  that  intervals  which  are  accurate  with  respect  to  nominal 
coverage  ensue.  We  give  two  specific  applications  (to  binomial  test  data  and 
Poisson  failure  rate  data)  where  we  compute  simultaneous  point  and  bins  corrected 
interval  estimates.  ' 
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